

Search for: All records

Creators/Authors contains: "Li, Zheng"


  1. Free, publicly-accessible full text available September 1, 2026
  2. Free, publicly-accessible full text available May 1, 2026
  3. Free, publicly-accessible full text available December 1, 2025
  4. Free, publicly-accessible full text available March 21, 2026
  5. Training a machine learning model with data following a meaningful order, i.e., from easy to hard, has been proven to be effective in accelerating the training process and achieving better model performance. The key enabling technique is curriculum learning (CL), which has seen great success and has been deployed in areas like image and text classification. Yet how CL affects the privacy of machine learning is unclear. Given that CL changes the way a model memorizes the training data, its influence on data privacy needs to be thoroughly evaluated. To fill this knowledge gap, we perform the first study and leverage membership inference attack (MIA) and attribute inference attack (AIA) as two vectors to quantify the privacy leakage caused by CL. Our evaluation on nine real-world datasets with several attack methods (NN-based, metric-based, and label-only MIA, and NN-based AIA) revealed new insights about CL. First, MIA becomes slightly more effective when CL is applied, but the impact is much more prominent for the subset of training samples ranked as difficult. Second, a model trained under CL is less vulnerable to AIA than to MIA. Third, existing defense techniques like MemGuard and MixupMMD are not effective under CL. Finally, based on our insights into CL, we propose a new MIA, termed Diff-Cali, which exploits the difficulty scores for result calibration and is demonstrated to be effective against all CL methods and the normal training method. With this study, we hope to draw the community's attention to the unintended privacy risks of emerging machine-learning techniques and to encourage the development of new attack benchmarks and defense solutions.
    Free, publicly-accessible full text available January 1, 2026
  6. Chiruzzo, Luis; Ritter, Alan; Wang, Lu (Ed.)
    The instruction hierarchy, which establishes a priority order from system messages to user messages, conversation history, and tool outputs, is essential for ensuring consistent and safe behavior in language models (LMs). Despite its importance, this topic receives limited attention, and there is a lack of comprehensive benchmarks for evaluating models’ ability to follow the instruction hierarchy. We bridge this gap by introducing IHEval, a novel benchmark comprising 3,538 examples across nine tasks, covering cases where instructions in different priorities either align or conflict. Our evaluation of popular LMs highlights their struggle to recognize instruction priorities. All evaluated models experience a sharp performance decline when facing conflicting instructions, compared to their original instruction-following performance. Moreover, the most competitive open-source model only achieves 48% accuracy in resolving such conflicts. Our results underscore the need for targeted optimization in the future development of LMs. 
    Free, publicly-accessible full text available April 27, 2026
  7. Free, publicly-accessible full text available December 2, 2025
  8. Delineation of microbial habitats within the soil matrix and characterization of their environments and metabolic processes are crucial to understand soil functioning, yet their experimental identification remains persistently limited. We combined single- and triple-energy X-ray computed microtomography with pore-specific allocation of 13C-labeled glucose and subsequent stable isotope probing to demonstrate how long-term disparities in vegetation history modify spatial distribution patterns of soil pore and particulate organic matter drivers of microbial habitats, and to probe bacterial communities populating such habitats. Here we show striking differences between large (30-150 µm Ø) and small (4-10 µm Ø) soil pores in (i) microbial diversity, composition, and life-strategies, (ii) responses to added substrate, (iii) metabolic pathways, and (iv) the processing and fate of labile C. We propose a microbial habitat classification concept based on biogeochemical mechanisms and localization of soil processes, and suggest interventions to mitigate the environmental consequences of agricultural management.
  9. Background: Breast cancer poses a significant health risk to women worldwide, with approximately 30% being diagnosed annually in the United States. The identification of cancerous mammary tissues from non-cancerous ones during surgery is crucial for the complete removal of tumors. Results: Our study utilized machine learning techniques (Random Forest (RF), Support Vector Machine (SVM), and Convolutional Neural Network (CNN)) alongside Raman spectroscopy to streamline and hasten the differentiation of normal and late-stage cancerous mammary tissues in mice. The classification accuracy rates achieved by these models were 94.47% for RF, 96.76% for SVM, and 97.58% for CNN. To the best of our knowledge, this study was the first effort to compare the effectiveness of these three machine-learning techniques in classifying breast cancer tissues based on their Raman spectra. Moreover, we identified specific spectral peaks that contribute to the molecular characteristics of the murine cancerous and non-cancerous tissues. Conclusions: Our integrated approach of machine learning and Raman spectroscopy presents a non-invasive, swift diagnostic tool for breast cancer, offering promising applications in intraoperative settings.
    Free, publicly-accessible full text available December 1, 2025
  10. A longstanding question in plant evolution is why ferns have many more chromosomes than angiosperms. The leading hypothesis proposes that ferns have undergone ancient polyploidy without subsequent chromosome loss or gene deletion, which would explain their high chromosome numbers. Here, we test this hypothesis by estimating ancient polyploidy frequency, chromosome evolution, protein evolution in meiosis genes, and patterns of gene retention in ferns. We found similar rates of paleopolyploidy in ferns and angiosperms from independent phylogenomic and chromosome number evolution analyses, but lower rates of chromosome loss in ferns. We found elevated evolutionary rates in meiosis genes in angiosperms, but not in ferns. Finally, we found some evidence of parallel and biased gene retention in ferns, but this was weak compared to the patterns in angiosperms. This work provides genomic evidence supporting a decades-old hypothesis on fern genome evolution and provides a foundation for future work on plant genome structure.
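
Illustration for entry 6: the instruction hierarchy described there ranks system messages above user messages, conversation history, and tool outputs. The following is a minimal sketch of that priority rule only, not the IHEval benchmark or any model's actual behavior. The names `PRIORITY`, `Instruction`, and `resolve`, and the representation of an instruction as a key/value setting, are assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical ranks, following the order named in the abstract:
# system message > user message > conversation history > tool output.
PRIORITY = {"system": 0, "user": 1, "history": 2, "tool": 3}


@dataclass(frozen=True)
class Instruction:
    source: str  # one of the PRIORITY keys
    key: str     # what the instruction controls, e.g. "language"
    value: str   # the requested setting


def resolve(instructions):
    """Return, per key, the value requested by the highest-priority source.

    When two instructions share the same source rank, the earlier one wins.
    """
    winners = {}
    for ins in instructions:
        current = winners.get(ins.key)
        if current is None or PRIORITY[ins.source] < PRIORITY[current.source]:
            winners[ins.key] = ins
    return {key: ins.value for key, ins in winners.items()}


example = [
    Instruction("tool", "language", "French"),     # conflicts with system
    Instruction("system", "language", "English"),  # higher priority, wins
    Instruction("user", "format", "bullet list"),  # no conflict
]
resolved = resolve(example)
print(resolved)
```

In this toy setting the tool output's request to answer in French loses to the system message's request for English, which is the kind of conflict the abstract says evaluated models struggle to resolve.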